In this paper we study the correlations that arise when two separated parties perform measurements
on systems they hold locally. We restrict ourselves to those correlations with which arbitrarily
fast transmission of information is impossible. These correlations are called nonsignaling. We allow
the measurements to be chosen from sets of an arbitrary size, but promise that each measurement
has only two possible outcomes. We find the structure of this convex set of nonsignaling correlations
by characterizing its extreme points. Taking an information-theoretic view, we prove that all
of these extreme correlations are interconvertible. This suggests that the simplest extremal nonlocal
distribution (called a PR box) might be the basic unit of nonlocality. We also show that this
unit of nonlocality is sufficient to simulate all quantum states when measured with two-outcome
measurements.

I. INTRODUCTION

Measurements on parts of quantum states held by spatially separated parties cannot be used for
superluminal signalling; in this respect quantum mechanics is a nonsignaling theory. John Bell 0]
exposed a novel feature of the theory when he considered a gedanken experiment of the following form:
two separated parties, Alice and Bob, locally measure two physical systems which were, at an earlier
time, very close together. Bell found quantum states which display measurement outcome statistics
which vary, as Alice and Bob change their measurements, in a way which cannot be explained by only
assuming an exchange of classical information when the two systems were close together in the past.
This behavior is termed quantum nonlocality and has partial experimental validation (for a discussion
of experimental tests and loopholes see [2] and references therein).

Quantum mechanics is not the only conceivable theory that predicts correlations which, though they
are nonsignaling, cannot be understood as having been established in the past. This paper investigates
the structure of the set of all possible nonsignaling correlations and attempts to characterize these in
information-theoretic terms.

Since quantum mechanics is so successful in its predictions, it might seem unusual to consider other
theories, with different kinds of correlations, which are not physically instantiated. There are practical
and foundational physical motives, as well as information-theoretic reasons, for considering a broader
class of correlations.

Motivated by the technological promise of quantum information, there is a drive to understand the
origins of quantum features which may have concrete applications. They could be direct consequences
of the fact that quantum mechanics is a nonsignaling theory or, alternatively, exploit other features
of the theory. Such concerns motivated the information-theoretic treatment of nonlocal correlations in
01. A second reason for interest in general nonsignaling theories is foundational; given that quantum
mechanics is a nonsignaling theory, what simplest possible extra features must be added to explain the
results of our experiments? Popescu and Rohrlich Q show that there exist nonsignaling correlations
which cannot be reproduced by quantum mechanics: why is quantum mechanics not the most general
kind of nonsignaling theory, and what further constraints does it satisfy?

In the context of communication complexity and cryptography, interesting results have come from
considering nonsignaling correlations. Van Dam [5] showed that, equipped with 'superstrong nonlocal'
correlations, all bipartite communication complexity problems are rendered trivial (requiring only one
bit of communication). There has also been work relating bit commitment to nonsignaling p, 0, B. In
cryptography, it is best to have security proofs that rely on a minimum number of principles; in [9], a
key distribution scheme is presented which can be proved secure by only assuming nonsignaling.

Our work follows that of Barrett et al. Q. They characterize the bipartite nonlocal correlations arising
when Alice and Bob can perform one of two measurements, each with an arbitrary number of outcomes.
They also provide results on the interconversion of correlations and consider the case of more than two
parties.

In this paper, we consider the set of bipartite nonsignaling correlations, where each party performs
one from an arbitrary set of measurements and each measurement has two possible outcomes (a reversal
of the situation in [3]). This set is a convex polytope and we characterize it in terms of its extreme
points. The structure of these extreme points has already been used by one of the authors ^3 to show
that, for all nonsignaling theories, the more incompatible two observables are, the more uncertain their
corresponding outcomes. From an information-theoretic perspective, we also prove that all nonlocal
extremal correlations are interconvertible, in the sense that given sufficient copies, any one can simulate
any other. Consequently, any nonlocal extremal distribution can simulate any non-extremal one. The
simplest extremal nonlocal correlations are called Popescu-Rohrlich (PR) boxes. One can thus consider
the PR box as the unit resource of bipartite nonlocal correlations, in the same fashion as the singlet is
considered the unit resource of quantum correlations. It is, as yet, unclear if they can serve as a sufficient
unit in more general cases. Since quantum correlations are nonsignaling, all those within the polytope
considered can be simulated by PR boxes. It has previously been shown [11] that all possible projective
measurements on the singlet state of two qubits can be simulated using just one PR box and shared
randomness (our result can be seen as an extension from projective measurements on singlets to POVMs
on general bipartite quantum states).

This paper is structured in the following way. In Section II the set of nonsignaling correlations is 
characterized in terms of inequalities. Section III reviews past results and characterizes the structure of 
this set in terms of its extreme points. Section IV is devoted to the inter-convertibility of nonsignaling 
correlations. Section V concludes and shows, by giving an example, that the extreme points in more 
general cases have nonuniform marginals and thus lack the simple structures found in Section III and in 



II. NO-SIGNALLING CORRELATIONS 



In what follows, we will consider two parties — Alice and Bob — each performing space-like separated 
operations. Each possesses a physical system which can be measured in several distinct ways and each 
measurement can yield several distinct results. Let x (y) denote the observable chosen by Alice (Bob) 
(these will also be called inputs), and a (b) be the result of Alice's (Bob's) measurement (these will also 
be called outputs). The statistics of these measurements define a joint probability distribution for the 
outputs, conditioned on the inputs, $P_{ab|xy}$, which satisfies the usual constraints:

$P_{ab|xy} \geq 0 \quad \forall\, a, b, x, y,$ (1)

$\sum_{a,b} P_{ab|xy} = 1 \quad \forall\, x, y.$ (2)

We consider the input $x$ ($y$) to take values from an alphabet of length $d_x$ ($d_y$), that is,
$x \in \{0, \ldots, d_x - 1\}$ and $y \in \{0, \ldots, d_y - 1\}$. The output $a$ ($b$) takes values from an
alphabet of length $d_a$ ($d_b$): $a \in \{0, \ldots, d_a - 1\}$ and $b \in \{0, \ldots, d_b - 1\}$.



A. No-signalling constraints 

The requirement that Alice and Bob cannot signal to each other by using their correlations is equivalent
to the condition that Alice's output is independent of Bob's input, that is, $P_{a|x}$ is independent of $y$
(and vice-versa):

$\sum_b P_{ab|xy} = \sum_b P_{ab|xy'} \quad \forall\, a, x, y, y',$ (3)

$\sum_a P_{ab|xy} = \sum_a P_{ab|x'y} \quad \forall\, b, x, x', y.$ (4)

For fixed $d_x$, $d_y$, $d_a$, and $d_b$, the set of probability distributions satisfying Eqs. (1) and (2) is
convex and has a finite number of extreme points. In other words, it is a convex polytope. It is known that
the intersection of a polytope with an affine set, like the one defined by the no-signaling constraints of
Eqs. (3) and (4), defines another convex polytope. From now on, all distributions are assumed to belong to
this set. In this paper such
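distributions are represented by tables of the form given in Table I.

$$
\begin{array}{c|c|c|c|c}
y \backslash x & 0 & 1 & \cdots & d_x-1 \\ \hline
0 &
\begin{matrix} P_{00|00} & P_{10|00} \\ P_{01|00} & P_{11|00} \end{matrix} &
\begin{matrix} P_{00|10} & P_{10|10} \\ P_{01|10} & P_{11|10} \end{matrix} &
\cdots &
\begin{matrix} P_{00|d_x-1,0} & P_{10|d_x-1,0} \\ P_{01|d_x-1,0} & P_{11|d_x-1,0} \end{matrix} \\ \hline
1 &
\begin{matrix} P_{00|01} & P_{10|01} \\ P_{01|01} & P_{11|01} \end{matrix} &
\begin{matrix} P_{00|11} & P_{10|11} \\ P_{01|11} & P_{11|11} \end{matrix} &
\cdots &
\begin{matrix} P_{00|d_x-1,1} & P_{10|d_x-1,1} \\ P_{01|d_x-1,1} & P_{11|d_x-1,1} \end{matrix} \\ \hline
\vdots & \vdots & \vdots & \ddots & \vdots \\ \hline
d_y-1 &
\begin{matrix} P_{00|0,d_y-1} & P_{10|0,d_y-1} \\ P_{01|0,d_y-1} & P_{11|0,d_y-1} \end{matrix} &
\begin{matrix} P_{00|1,d_y-1} & P_{10|1,d_y-1} \\ P_{01|1,d_y-1} & P_{11|1,d_y-1} \end{matrix} &
\cdots &
\begin{matrix} P_{00|d_x-1,d_y-1} & P_{10|d_x-1,d_y-1} \\ P_{01|d_x-1,d_y-1} & P_{11|d_x-1,d_y-1} \end{matrix}
\end{array}
$$

TABLE I: This table represents a general probability distribution for $d_x$ and $d_y$ arbitrary and
$d_a = d_b = 2$. The distribution is broken into $d_x \times d_y$ cells, with one cell for every input pair
$(x, y)$. Each cell specifies the probabilities of the four possible outcomes given these inputs (these must
sum to one). The nonsignaling conditions require, for example, that
$P_{00|00} + P_{01|00} = P_{00|01} + P_{01|01}$ and that
$P_{01|00} + P_{11|00} = P_{01|d_x-1,0} + P_{11|d_x-1,0}$.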
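
As a concrete illustration of constraints (1)-(4), the following minimal Python sketch checks whether a table of probabilities is a valid nonsignaling distribution. The dictionary layout `P[(a, b, x, y)]`, the function name `is_nonsignaling` and the numerical tolerance are assumptions made for this illustration; they do not appear in the original text.

```python
from itertools import product


def is_nonsignaling(P, dx, dy, da, db, tol=1e-12):
    """Check Eqs. (1)-(4) for P given as a dict keyed by (a, b, x, y)."""
    outputs = list(product(range(da), range(db)))
    for x, y in product(range(dx), range(dy)):
        if any(P[a, b, x, y] < -tol for a, b in outputs):
            return False  # Eq. (1): positivity
        if abs(sum(P[a, b, x, y] for a, b in outputs) - 1) > tol:
            return False  # Eq. (2): normalization
    for a, x in product(range(da), range(dx)):
        ref = sum(P[a, b, x, 0] for b in range(db))
        for y in range(dy):
            if abs(sum(P[a, b, x, y] for b in range(db)) - ref) > tol:
                return False  # Eq. (3): Alice's marginal depends on y
    for b, y in product(range(db), range(dy)):
        ref = sum(P[a, b, 0, y] for a in range(da))
        for x in range(dx):
            if abs(sum(P[a, b, x, y] for a in range(da)) - ref) > tol:
                return False  # Eq. (4): Bob's marginal depends on x
    return True


# Hypothetical examples: uniform noise, and a signaling table where a = y, b = 0.
uniform = {k: 0.25 for k in product(range(2), repeat=4)}
signaling = {(a, b, x, y): float(a == y and b == 0)
             for a, b, x, y in product(range(2), repeat=4)}
```

The second example is normalized and positive but lets Bob's input determine Alice's output, so it fails Eq. (3).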

B. Local correlations 

Local correlations are those that can be reproduced by parties equipped only with shared randomness. 
These are a proper subset of nonsignaling correlations. One can always write them as: 

$P_{ab|xy} = \sum_e P_e \, P_{a|xe} \, P_{b|ye}.$ (5)

A protocol for generating $P_{ab|xy}$ is the following: the shared random variable takes the value $e$ with
probability $P_e$, and Alice (Bob) then samples her (his) output from the distribution $P_{a|xe}$
($P_{b|ye}$). It is known that the set of local correlations is a convex polytope, with some of the facets
being Bell-like inequalities. The extreme points of this polytope correspond to local deterministic
distributions, that is, $P_{ab|xy} = \delta_{a,f(x)} \delta_{b,g(y)}$, where $f(x)$ and $g(y)$ map each input
value to a single output value. Correlations that are not of the form (5) are called nonlocal.
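
The decomposition in Eq. (5) can be sketched in a few lines of Python: local deterministic points are built from functions $f$ and $g$, and shared randomness corresponds to mixing them. The helper names `deterministic` and `mix`, and the representation of $f$ and $g$ as tuples, are illustrative assumptions only.

```python
from itertools import product


def deterministic(f, g, da, db):
    """Local deterministic point P_{ab|xy} = delta_{a,f(x)} delta_{b,g(y)}.

    f and g are tuples: f[x] is Alice's output for input x, g[y] is Bob's.
    """
    return {(a, b, x, y): float(a == f[x] and b == g[y])
            for a, b in product(range(da), range(db))
            for x, y in product(range(len(f)), range(len(g)))}


def mix(weights, points):
    """Convex mixture sum_e P_e * point_e, i.e. Eq. (5) with shared randomness e."""
    return {k: sum(w * p[k] for w, p in zip(weights, points))
            for k in points[0]}


# All 16 local deterministic points for binary inputs and outputs.
strategies = list(product(range(2), repeat=2))
vertices = [deterministic(f, g, 2, 2) for f in strategies for g in strategies]

# Shared coin: both parties output 0, or both output 1, regardless of input.
correlated = mix([0.5, 0.5], [deterministic((0, 0), (0, 0), 2, 2),
                              deterministic((1, 1), (1, 1), 2, 2)])
```

The mixture `correlated` has perfectly correlated, locally uniform outputs for every input pair, yet it is local by construction.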

C. Quantum Correlations 

Quantum correlations are generated if Alice and Bob share quantum entanglement. These can be 
written as: 

$P_{ab|xy} = \mathrm{tr}\left[ F_a^x \otimes F_b^y \, \rho \right],$ (6)

where $\{F_0^x, \ldots, F_{d_a-1}^x\}$ and $\{F_0^y, \ldots, F_{d_b-1}^y\}$ are positive operator valued
measures for each $x$ and $y$, and $\rho$ is a density matrix. Though this set is convex, it is not a
polytope; it includes all local correlations and also probability distributions which are nonlocal. It is,
however, smaller than the full set of nonsignaling correlations. This was proved in [3] by providing an
example of a nonsignaling distribution forbidden by quantum mechanics.

III. EXTREME NONSIGNALING CORRELATIONS 

The full set of extremal distributions for the general situation, where $d_x$, $d_y$, $d_a$ and $d_b$ are
all arbitrary, has not yet been characterized. In what follows, previous work considering the case
$d_a = d_b = d_x = d_y = 2$ and the case $d_x = d_y = 2$ with both $d_a$ and $d_b$ arbitrary will be
reviewed [Mll3||. Next, the extreme points for $d_a = d_b = 2$ and both $d_x$ and $d_y$ arbitrary will be
presented.

A. Reversible local transformations 

Applying reversible local transformations to a distribution does not change its nonlocal properties.
We say that two distributions are equivalent if one can be transformed into the other by means of local
reversible transformations. Identifying classes of extremal distributions which are equivalent simplifies
the task of categorizing all of them. It is sufficient to quote one representative element from each
equivalence class. Let us list all possible local reversible transformations:

• Permute the ordered set of input values for each party, $(0, 1, \ldots, d_x - 1)$ and
$(0, 1, \ldots, d_y - 1)$.

• Permute the ordered set of output values depending on the input. To indicate that $a$ and $b$
are associated with the particular inputs $(x, y)$, the notation $a_x$ and $b_y$ will sometimes be used.
Summarizing, one can apply a different permutation to each of the $a_x$ ($b_y$), for each value of $x$ ($y$).



B. Binary inputs and outputs 



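All extremal nonlocal distributions for the case $d_x = d_y = d_a = d_b = 2$ are equivalent to:

$$
\begin{array}{c|c|c}
y \backslash x & 0 & 1 \\ \hline
0 &
\begin{matrix} 1/2 & 0 \\ 0 & 1/2 \end{matrix} &
\begin{matrix} 1/2 & 0 \\ 0 & 1/2 \end{matrix} \\ \hline
1 &
\begin{matrix} 1/2 & 0 \\ 0 & 1/2 \end{matrix} &
\begin{matrix} 0 & 1/2 \\ 1/2 & 0 \end{matrix}
\end{array}
\qquad (7)
$$

(this format is explained in Table I) or alternatively:

$$
P_{ab|xy} = \begin{cases} 1/2 & : \; a + b \bmod 2 = xy \\ 0 & : \; \text{otherwise,} \end{cases} \qquad (8)
$$

where it is understood that $a$ and $b$ are locally uniformly distributed. This distribution is also called a
PR box and constitutes the paradigm of nonlocality. The outputs of a PR box are jointly correlated in a way
that depends on both inputs together; but PR boxes are nonsignaling since their outputs are (locally)
random, obeying Eqs. (3) and (4).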
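
To make Eq. (8) concrete, here is a small Python sketch that tabulates the PR box and computes its local marginals. The names `pr_box`, `alice_marginal` and `bob_marginal` are assumptions introduced for this illustration.

```python
from itertools import product


def pr_box():
    """PR box of Eq. (8): P_{ab|xy} = 1/2 if a + b = xy (mod 2), else 0."""
    return {(a, b, x, y): 0.5 if (a + b) % 2 == x * y else 0.0
            for a, b, x, y in product(range(2), repeat=4)}


def alice_marginal(P, a, x, y):
    """P_{a|x} computed at a given y; nonsignaling means it does not depend on y."""
    return sum(P[a, b, x, y] for b in range(2))


def bob_marginal(P, b, x, y):
    """P_{b|y} computed at a given x; nonsignaling means it does not depend on x."""
    return sum(P[a, b, x, y] for a in range(2))


P = pr_box()
```

Every marginal evaluates to 1/2 whatever the other party's input is, which is the sense in which the outputs are locally random even though the pair $(a, b)$ is fully determined up to one shared bit.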



C. Binary inputs and arbitrary outputs 

Barrett et al. provided the following characterization for the case where $d_x = d_y = 2$ and the numbers of
outcomes $d_a$, $d_b$ are arbitrary. Each inequivalent extremal nonlocal distribution is characterized by
one value of the parameter $k \in \{2, \ldots, \min(d_a, d_b)\}$. For each $k$, its corresponding
distribution is

$$
P_{ab|xy} = \begin{cases} 1/k & : \; (b - a) \bmod k = xy \\ 0 & : \; \text{otherwise,} \end{cases} \qquad (9)
$$

where $a, b \in \{0, \ldots, k - 1\}$ and are locally uniformly distributed. Note that Eq. (8) is recovered
when $d_a = d_b = 2$.
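
The family in Eq. (9) is easy to tabulate. The sketch below (the function name `k_box` and the use of exact fractions are choices made for this illustration) builds the distribution for a given $k$ so that the $k = 2$ case can be compared with the PR box.

```python
from fractions import Fraction
from itertools import product


def k_box(k):
    """Distribution of Eq. (9): P_{ab|xy} = 1/k if (b - a) mod k = xy, else 0."""
    return {(a, b, x, y): Fraction(1, k) if (b - a) % k == x * y else Fraction(0)
            for a, b in product(range(k), repeat=2)
            for x, y in product(range(2), repeat=2)}


box3 = k_box(3)   # three outcomes per party
box2 = k_box(2)   # should coincide with the PR box of Eq. (8)
```

For $k = 2$ the condition $(b - a) \bmod 2 = xy$ is the same as $a + b \bmod 2 = xy$, which is how Eq. (8) is recovered.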



D. Arbitrary inputs and binary outputs 

In what follows, one of the main results of our paper is presented. We give a characterization of all 
extreme distributions for the case where dx and dy are arbitrary, and da = db = 2. The proof of this 
result is provided in the appendix. 

Result 1: Table II provides at least one representative element of all classes of extremal correlations
for a given $d_x$ and $d_y$. Each of these distributions is characterized as follows:

1. Giving two integers $g_x$ and $g_y$, where $g_x \in \{2, 3, \ldots, d_x\}$ and
$g_y \in \{2, 3, \ldots, d_y\}$ if the distribution is nonlocal, and $g_x = g_y = 0$ if the distribution is
local.

2. And assigning perfect correlation or anti-correlation to all the cells with a question mark '?', that
is

$$
? \; = \; \begin{matrix} 1/2 & 0 \\ 0 & 1/2 \end{matrix}
\quad \text{or} \quad
\begin{matrix} 0 & 1/2 \\ 1/2 & 0 \end{matrix} \; . \qquad (10)
$$



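$$
\begin{array}{c|ccccc|ccc}
y \backslash x & 0 & 1 & 2 & \cdots & g_x-1 & g_x & \cdots & d_x-1 \\ \hline
0 & C & C & C & \cdots & C & U & \cdots & U \\
1 & C & A & ? & \cdots & ? & U & \cdots & U \\
2 & C & ? & ? & \cdots & ? & U & \cdots & U \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
g_y-1 & C & ? & ? & \cdots & ? & U & \cdots & U \\ \hline
g_y & V & V & V & \cdots & V & D & \cdots & D \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
d_y-1 & V & V & V & \cdots & V & D & \cdots & D
\end{array}
$$

Here the letters stand for the following cells, written in the format of Table I:

$$
C = \begin{matrix} 1/2 & 0 \\ 0 & 1/2 \end{matrix}, \quad
A = \begin{matrix} 0 & 1/2 \\ 1/2 & 0 \end{matrix}, \quad
U = \begin{matrix} 1/2 & 0 \\ 1/2 & 0 \end{matrix}, \quad
V = \begin{matrix} 1/2 & 1/2 \\ 0 & 0 \end{matrix}, \quad
D = \begin{matrix} 1 & 0 \\ 0 & 0 \end{matrix},
$$

and '?' is either of the two cells in Eq. (10).

TABLE II: This table gives a representative element of all classes of extreme points, where Alice (Bob) has
$d_x$ ($d_y$) different input settings, and $g_x$ ($g_y$) of them are nondeterministic. Cells containing a '?'
can either be perfectly correlated (like the cell corresponding to $x = y = 0$) or anti-correlated (like the
cell corresponding to $x = y = 1$).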

As one can see in Table II, each party has two kinds of input settings: (i) the deterministic ones
($x \geq g_x$ for Alice) have a fixed outcome, (ii) the nondeterministic ones ($x < g_x$ for Alice) have
uniform probabilities for their corresponding outcomes, $P_{0|x} = P_{1|x} = 1/2$. There are $g_x$
nondeterministic input settings and $d_x - g_x$ deterministic input settings at Alice's site, and analogously
for Bob. The representative distributions are chosen to have the outcomes for all deterministic input
settings fixed to '0'.

The following observation will prove crucial. When the distribution is nonlocal, that is $g_x, g_y \geq 2$,
there is always a PR box structure when both parties restrict to $x, y \in \{0, 1\}$.

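Extreme points for which $g_x = d_x$ and $g_y = d_y$ can be algebraically characterized by:
$a_x + b_y = \delta_{x,1}\delta_{y,1} + \sum_{(i,j) \in Q} \delta_{x,i}\delta_{y,j} \bmod 2$. Here $Q$ is any
subset of the set $\{1, \ldots, d_x - 1\} \times \{1, \ldots, d_y - 1\} - \{(1, 1)\}$.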
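
The algebraic characterization can be turned into a short constructive sketch. It assumes the reading $a_x + b_y = \delta_{x,1}\delta_{y,1} + \sum_{(i,j) \in Q} \delta_{x,i}\delta_{y,j} \bmod 2$ with inputs labelled from $0$, so that $Q$ contains pairs $(i, j)$ with $i, j \geq 1$ other than $(1, 1)$; this reading, the function name `extreme_point` and the example set `Q` are assumptions made for illustration.

```python
from itertools import product


def extreme_point(dx, dy, Q):
    """Binary-output table with unbiased marginals and a_x + b_y fixed by Q (mod 2)."""
    for (i, j) in Q:
        if not (1 <= i < dx and 1 <= j < dy) or (i, j) == (1, 1):
            raise ValueError("Q must lie in {1..dx-1} x {1..dy-1} minus {(1, 1)}")

    def parity(x, y):
        # 1 for anti-correlated cells, 0 for perfectly correlated cells
        return (int((x, y) == (1, 1)) + int((x, y) in Q)) % 2

    return {(a, b, x, y): 0.5 if (a + b) % 2 == parity(x, y) else 0.0
            for a, b in product(range(2), repeat=2)
            for x, y in product(range(dx), range(dy))}


# Hypothetical example with three inputs per party.
P = extreme_point(3, 3, {(1, 2), (2, 2)})
```

Restricting the resulting table to $x, y \in \{0, 1\}$ gives back the PR box, in line with the observation above.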

IV. INTERCONVERSION OF NONLOCAL CORRELATIONS 

In this section we prove that all extremal nonlocal correlations with binary outputs can be
interconverted. This means that all contain the same kind of nonlocality. By saying that the distribution
$P_{ab|xy}$ can be converted into $P'_{ab|xy}$ we mean: given enough copies (realizations) of $P_{ab|xy}$,
Alice and Bob can simulate the statistics of $P'_{ab|xy}$ for each value of $x$ and $y$ that they
independently choose. We assume that the two parties can perform local operations and have unlimited shared
randomness. This is a fair assumption because with these resources (shared randomness and local operations)
we cannot create nonlocality.

Result 2: All nonlocal extremal correlations with arbitrary $d_x$ and $d_y$, and binary output
($d_a = d_b = 2$), are interconvertible.

In order to prove this statement, we first argue that all extremal nonlocal correlations can simulate a 
PR box, and second, we prove that PR boxes are sufficient to simulate all extremal distributions. By 
recalling that all distributions can be written as probabilistic mixtures of extreme points (noting that 
such mixtures can be reproduced by shared randomness), one can also make the following statement: 

Result 3: PR boxes are sufficient to simulate all nonsignaling correlations with binary output (da = 
db = 2). 

By looking at Table II one can see that, if Alice and Bob share a nonlocal distribution
($g_x, g_y \geq 2$), they have a PR box when restricting $x, y \in \{0, 1\}$. This shows that a single copy of
any nonlocal extremal distribution can simulate a PR box. Next, we present a protocol that allows Alice and
Bob to simulate any distribution of the form described in Table II, by only using a finite number of PR
boxes. This protocol is based on an idea presented in Q.

If $g_x = g_y = 0$ the distribution is local, and thus it can be simulated with the protocol detailed
in Section II.B without using PR boxes. When $g_x, g_y \geq 2$, however, such protocols cannot be used.
Let us first describe how to make the simulation when Alice and Bob choose inputs $x \leq g_x - 1$ and
$y \leq g_y - 1$. Within this range of input settings the outcomes $a_x$ and $b_y$ are locally random and
they are either perfectly correlated ($a_x + b_y = 0 \bmod 2$), or anti-correlated
($a_x + b_y = 1 \bmod 2$). Equivalently, any distribution of the form defined by Table II for inputs
$x \leq g_x - 1$ and $y \leq g_y - 1$ is equally well defined by a function:

$F(x, y) = a_x + b_y.$ (11)

Throughout this section all equalities are always modulo 2 and thus we omit the specification '(mod
2)'. Let us expand $x$ and $y$ in binary: $x = (x_1 x_2 \cdots x_{n_x})$, $y = (y_1 y_2 \cdots y_{n_y})$, where
$n_x = \lceil \log_2 g_x \rceil$ and $n_y = \lceil \log_2 g_y \rceil$. The function $F(x, y)$ can always be
expressed as a polynomial of the binary variables $x_1, \ldots, x_{n_x}, y_1, \ldots, y_{n_y}$. More
specifically, one can always write $F(x, y)$ as a finite sum of products

$F(x, y) = \sum_i P_i(x) \, Q_i(y),$ (12)

where each $P_i(x)$ is a polynomial in the variables $\{x_1, x_2, \ldots, x_{n_x}\}$, and each $Q_i$ is a
monomial in the variables $\{y_1, y_2, \ldots, y_{n_y}\}$. The sum has at most $2^{n_y}$ terms, because there
are $2^{n_y}$ distinct monomials in the variables $\{y_1, y_2, \ldots, y_{n_y}\}$.

Let us describe the protocol. Suppose Alice and Bob choose the input settings $x \leq g_x - 1$ and
$y \leq g_y - 1$. Alice (Bob) evaluates the $2^{n_y}$ numbers $r_i = P_i(x)$ ($s_i = Q_i(y)$) depending on
the $x$ ($y$) chosen. Then, Alice (Bob) inputs the binary number $r_i$ ($s_i$) in the $i$-th PR box and
obtains the outcome $a_i$ ($b_i$). They do such operations for all $i = 1, \ldots, 2^{n_y}$. Finally, each
party computes its output of the simulated distribution ($a_x$, $b_y$) by summing the local outputs of the
PR boxes:

$a_x := \sum_{i=1}^{2^{n_y}} a_i, \qquad b_y := \sum_{i=1}^{2^{n_y}} b_i.$ (13)

The protocol works because of the next chain of equalities:

$F(x, y) = \sum_{i=1}^{2^{n_y}} P_i(x) Q_i(y) = \sum_{i=1}^{2^{n_y}} r_i s_i = \sum_{i=1}^{2^{n_y}} (a_i + b_i) = \sum_{i=1}^{2^{n_y}} a_i + \sum_{j=1}^{2^{n_y}} b_j = a_x + b_y.$ (14)

To see the third equality, just recall that for each PR box $a_i + b_i = r_i s_i$ holds.

Let us now consider the case where Alice picks an input $x \geq g_x$; she must then assign to $a_x$ the
corresponding deterministic value, and analogously for Bob. One can see that the simulation protocol
works for all values of $x$ and $y$.
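
The protocol of Eqs. (11)-(14) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' construction: the coefficients $r_i = P_i(x)$ are obtained here by a GF(2) Möbius transform over Bob's bits (one standard way to realize the decomposition of Eq. (12)), monomials are indexed by bit masks `S`, values of $F$ for unused inputs $y \geq g_y$ are padded with 0, and the names `pr_box`, `alice_inputs`, `bob_inputs` and `simulate` are hypothetical.

```python
import random


def pr_box(r, s, rng):
    """One PR box: a is uniformly random and a XOR b = r AND s."""
    a = rng.randrange(2)
    return a, a ^ (r & s)


def alice_inputs(F, x, gy):
    """r_S = P_S(x): coefficient of the monomial prod_{j in S} y_j in y -> F(x, y)."""
    n_y = (gy - 1).bit_length()              # equals ceil(log2(gy)) for gy >= 2
    row = [F(x, y) if y < gy else 0 for y in range(2 ** n_y)]  # pad unused inputs
    coeffs = []
    for S in range(2 ** n_y):
        value = 0
        for T in range(2 ** n_y):
            if (T & S) == T:                 # T is a sub-mask of S
                value ^= row[T]
        coeffs.append(value)
    return coeffs


def bob_inputs(y, gy):
    """s_S = Q_S(y): the monomial indexed by S equals 1 iff every bit of S is set in y."""
    n_y = (gy - 1).bit_length()
    return [int((y & S) == S) for S in range(2 ** n_y)]


def simulate(F, x, y, gy, rng):
    """Return (a_x, b_y) using one PR box per monomial; a_x XOR b_y = F(x, y)."""
    a_x = b_y = 0
    for r, s in zip(alice_inputs(F, x, gy), bob_inputs(y, gy)):
        a, b = pr_box(r, s, rng)
        a_x ^= a
        b_y ^= b
    return a_x, b_y
```

Alice's list depends only on $x$ and Bob's only on $y$, so no communication is needed; the randomness of each box's output $a_i$ makes the final outputs locally uniform, while their sum modulo 2 is fixed to $F(x, y)$.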

A corollary of Result 3 is the following. Since quantum correlations are nonsignaling, the statistics of
any two-outcome measurement experiment on any bipartite quantum state can also be simulated with
PR boxes (as noted in the introduction, this result extends [11]).

[Table III: a 3 x 3 grid of cells indexed by the inputs x, y in {0, 1, 2}. The nonzero entries that survive
in the source are 1/2 and 1/4; their positions within each cell are not recoverable.]

TABLE III: An extreme point of the nonsignaling polytope for 3 input settings each with 3 possible outcomes.
Each cell contains 9 probabilities associated with the 3x3 possible outcome pairs.

V. DISCUSSION 

In this paper we have given a complete characterization of the extremal nonsignaling bipartite
probability distributions with binary outputs. We have grouped them into equivalence classes under local
reversible transformations. One can see in Table II that these extremal distributions have a more
complicated structure than in the binary input scenario (Eqs. (8) and (9)). Nevertheless, if we consider
purely nonlocal distributions ($g_x = d_x$ and $g_y = d_y$) all the marginals are unbiased
($P_{a|x} = P_{b|y} = 1/2$) and they are easily defined by specifying which input pairs $(x, y)$ have
correlated outputs and which $(x, y)$ have anti-correlated outputs. In more general cases the extremal
distributions stop showing these simple symmetries. An example of this more complex structure is given in
Table III. This is an extremal distribution for the case $d_a = d_b = d_x = d_y = 3$, which we discovered
numerically (that this is extremal can be verified by using arguments similar to those in Part 3 of the
Appendix). Its corresponding marginals are not unbiased and there are also some input pairs, $(x, y)$, for
which, once the output of one party is fixed, the outcomes of the other remain uncertain.

We have shown that all extremal nonlocal distributions with binary outputs are interconvertible. We 
have also given a specific protocol to implement this interconversion. By looking at the asymmetric 
structure of the extremal distribution in Table III, one sees that this protocol is not directly applicable 
in general. In particular it is an open question whether this distribution can be simulated by PR boxes. 
We conclude by noting that, just as treating the singlet as a unit of entanglement motivated numerous 
resource based questions (asymptotic interconversions, multipartite scenarios, etc) so too there is an 
analogous set of unanswered information-theoretic problems involving units of nonlocality. 

Note added. After the completion of this work, the authors were made aware that similar results have
been obtained by J. Barrett and S. Pironio p^.

APPENDIX

In this appendix we give the proof of Result 1. Firstly we show that any nonsignaling distribution can
be expressed as a convex combination of distributions equivalent to ones of the form given in Table II.
Secondly we show that all distributions of the form given in Table II are extremal.

Some simple definitions will prove useful throughout this appendix. The word 'cell' refers to the set of
four outcome probabilities $P_{00|xy}$, $P_{10|xy}$, $P_{01|xy}$, $P_{11|xy}$ associated with the input pair
$(x, y)$. It will be useful to think of $P_{ab|xy}$ as a table of cells with $d_x$ columns and $d_y$ rows
where, associated with each entry of the table, $(x, y)$, there is a cell of four probabilities (see
Table I). We say that a cell has 'one zero' if it has at least one of the four entries it contains set to
zero. We call $P_{a|x}$, $P_{b|y}$ $\forall\, a, b, x, y$ the 'marginals'. Specifically we define
$P_{a=0|x=i} = l_i$ and $P_{b=0|y=i} = m_i$. We now sketch the strategy adopted for Parts 1
and 2 of the proof.

An arbitrary distribution $P^{(i)}$ is expressed as a convex combination of two distributions:

$P^{(i)} = \lambda_i P_1^{(i+1)} + (1 - \lambda_i) P_2^{(i+1)}.$ (15)

We require that the new distributions, $P_1^{(i+1)}$ and $P_2^{(i+1)}$, have one more entry of their tables
set equal to zero. Next we select one of them: $P_1^{(i+1)}$ or $P_2^{(i+1)}$. We then repeat the above
decomposition for the selected distribution. A schematic of the approach is:

$P^{(1)} = \lambda_1 P_1^{(2)} + (1 - \lambda_1) P_2^{(2)},$ (16)

$P^{(2)} = P_{k_1}^{(2)}, \quad k_1 \in \{1, 2\},$ (17)

$P^{(2)} = \lambda_2 P_1^{(3)} + (1 - \lambda_2) P_2^{(3)},$ (18)

$P^{(3)} = P_{k_2}^{(3)}, \quad k_2 \in \{1, 2\},$ (19)

$P^{(3)} = \lambda_3 P_1^{(4)} + (1 - \lambda_3) P_2^{(4)},$ (20)

$\vdots$

$P^{(F-1)} = \lambda_{F-1} P_1^{(F)} + (1 - \lambda_{F-1}) P_2^{(F)},$ (21)

where the $P^{(i)}$ are probability distributions $P_{ab|xy}$ expressed as vectors and
$\lambda_i \in [0, 1]$. At each step a distribution with one more entry set to zero is selected. It may
happen that a distribution $P^{(i)}$ will already have a zero at the position demanded in the next step
(e.g. if $P^{(i)}$ is already extremal). The expression $k_i \in \{1, 2\}$ (e.g. in Eqs. (17) and (19))
indicates that the consecutive steps of the proof hold independently of which of the two distributions is
chosen. The procedure stops when the new distribution chosen, $P^{(F)}$, is equivalent to one of the form
given in Table II. We will see that this procedure, based on successive zeroing of entries, always finishes.

The proof of Result 1 is in three parts.

• In Part 1 we show that any probability distribution can be expressed as a convex combination
of probability distributions which have at least one zero in every cell and which have the same
marginals ($P_{a|x}$, $P_{b|y}$, $\forall\, a, b, x, y$) as the original distribution.

• In Part 2 we show that any probability distribution with one zero in every cell can be expressed as
convex combinations of probability distributions which are locally equivalent to Table II.

• In Part 3 we show that all distributions of the form defined in Table II are extremal.

Part 1

This Part is broken in two. We first show that every cell can be expressed as a convex combination
of two cells which satisfy the following constraints. Each has the same marginals as the original cell and
also has at least one of its four entries set to zero. The nonsignaling conditions (3) and (4) mean that all
cells in the same column, $x$, have the same marginals $P_{a=0|x} = l_x$, $P_{a=1|x} = 1 - l_x$. Consider a
cell with marginals $l_x \geq m_y \geq 1/2$. Given $l_x$ and $m_y$, there is one free parameter, $c$, needed
to completely specify the cell $(x, y)$:

$$
\left[ \begin{array}{cc} c & m_y - c \\ l_x - c & 1 + c - m_y - l_x \end{array} \right]_{xy} \qquad (22)
$$

By the above notation we mean: $P_{00|xy} = c$; $P_{10|xy} = m_y - c$; $P_{01|xy} = l_x - c$;
$P_{11|xy} = 1 + c - m_y - l_x$.

If we require the positivity of the four elements of the cell, then $c \in [m_y + l_x - 1, m_y]$. One can
readily check that all cells with allowed values of $c$ can be written as convex combinations of the two
cells where $c = m_y$ and $c = m_y + l_x - 1$:

$$
\left[ \begin{array}{cc} c & m_y - c \\ l_x - c & 1 + c - m_y - l_x \end{array} \right]_{xy}
= \lambda \left[ \begin{array}{cc} m_y & 0 \\ l_x - m_y & 1 - l_x \end{array} \right]_{xy}
+ (1 - \lambda) \left[ \begin{array}{cc} m_y + l_x - 1 & 1 - l_x \\ 1 - m_y & 0 \end{array} \right]_{xy} \qquad (23)
$$

Instead of $l_x \geq m_y \geq 1/2$, cells can satisfy different inequalities, e.g.
$m_y \geq l_x \geq 1/2$ or $m_y \geq 1/2 \geq l_x$. Using symmetries, one can see that, whatever inequalities
are satisfied, any cell can be expressed as a convex combination of two one-zero cells in a similar manner.
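
The single-cell decomposition of Eqs. (22) and (23) can be checked with exact arithmetic. The sketch below assumes the ordering $l_x \geq m_y \geq 1/2$ used in the text and takes $\lambda = (c - c_{\min}) / (c_{\max} - c_{\min})$, which follows because every entry of the cell is affine in $c$; the function names and the numerical values are illustrative assumptions.

```python
from fractions import Fraction as Fr


def cell(l, m, c):
    """Cell of Eq. (22) as the tuple (P00, P10, P01, P11)."""
    return (c, m - c, l - c, 1 + c - m - l)


def decompose_cell(l, m, c):
    """Return (lam, cell_hi, cell_lo) with cell = lam*cell_hi + (1-lam)*cell_lo.

    Assumes l >= m >= 1/2, so positivity gives c in [m + l - 1, m].
    """
    c_lo, c_hi = m + l - 1, m
    lam = (c - c_lo) / (c_hi - c_lo) if c_hi != c_lo else Fr(1)
    return lam, cell(l, m, c_hi), cell(l, m, c_lo)


# Hypothetical marginals and free parameter.
l, m, c = Fr(3, 4), Fr(3, 5), Fr(1, 2)
lam, hi, lo = decompose_cell(l, m, c)
recombined = tuple(lam * u + (1 - lam) * v for u, v in zip(hi, lo))
```

Both end-point cells contain a zero entry and share the marginals $l_x$ and $m_y$ of the original cell, which is the property Part 1 relies on.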

In the second half of this Part we generalise from single cells to the whole distribution. An iterative
procedure of the form described in Eqs. (16)-(21) can be applied. Any distribution, $P^{(1)}$, can be
expressed as a convex combination of two distributions, $P_1^{(2)}$ and $P_2^{(2)}$. Both $P_1^{(2)}$ and
$P_2^{(2)}$ have all cells equal to those of $P^{(1)}$, except for the cell $(x = 0, y = 0)$. This cell has
the same marginals as in $P^{(1)}$ but also has one more zero. $P_1^{(2)}$ (or $P_2^{(2)}$) can again be
expressed as a convex combination of two distributions $P_1^{(3)}$ and $P_2^{(3)}$ which are identical to
$P_1^{(2)}$ (or $P_2^{(2)}$), so that they also have $(x = 0, y = 0)$ as a one-zero cell, except that they
also both have $(x = 0, y = 1)$ as a one-zero cell (with the same marginals). This procedure can be
extended to all cells until the final step has a probability distribution which is a convex combination of
two distributions which have at least one zero in every cell. It follows that any probability distribution
can be expressed as a convex combination of probability distributions which have at least one zero in
every cell and which have the same marginals as the original distribution.



Part 2 



In this part we show that distributions with one zero in every cell can be expressed as a convex
combination of distributions equivalent to Table II. The argument exploits the fact that the only parameters
describing distributions with one zero in every cell are their marginals. It considers first a cell (1), then
a column (2) and finally a generic table (3).



1. A Cell 



In this section we identify the constraints on the marginals in one-zero cells. Consider a cell in the
first column $(x = 0, y = i)$ with the form:

$$
\left[ \begin{array}{cc} m_i & 0 \\ l_0 - m_i & 1 - l_0 \end{array} \right]_{0i} \qquad (24)
$$

Note that the marginals are $P_{a=0|x=0} = l_0$, $P_{b=0|y=i} = m_i$. Positivity requires that
$l_0 \in [m_i, 1]$. For the same marginals, $l_0$ and $m_i$, if $P_{00|0i} = 0$, instead of
$P_{10|0i} = 0$:

$$
\left[ \begin{array}{cc} 0 & m_i \\ l_0 & 1 - l_0 - m_i \end{array} \right]_{0i} \qquad (25)
$$

then $l_0 \in [0, 1 - m_i]$. If $P_{01|0i} = 0$ instead then $l_0 \in [0, m_i]$. Finally, if
$P_{11|0i} = 0$ then $l_0 \in [1 - m_i, 1]$. In an arbitrary one-zero cell, $l_0$ will thus lie in one of
four ranges:

$[0, 1 - m_i], \quad [0, m_i], \quad [m_i, 1], \quad [1 - m_i, 1],$ (26)

depending on which of its four elements is zero. Part of the information in Eq. (26) can be expressed as
follows. We call $v_1$ the lower bound on $l_0$ and $v_2$ the upper bound ($l_0 \in [v_1, v_2]$). Without
knowing which of the four entries is zero in the cell, or even knowing the value of $m_i$, we do know from
Eq. (26) that $v_1 \in \{0, 1 - m_i, m_i\}$ and $v_2 \in \{1, 1 - m_i, m_i\}$. This observation will be used
in the ensuing subsection.

2. A Column

In the following we use the constraints on $l_0$ deduced in the preceding section (Eq. (26)) to express a
column of a distribution's table as a convex combination of two simpler columns. Recall that all cells in
column $x$ (row $y$) have the same marginal $P_{a|x}$ ($P_{b|y}$) by Eqs. (3) and (4). Each probability
distribution has $d_y$ cells in the column $x = 0$. If there is one zero in every cell of the column, there
will be $d_y$ overlapping ranges (see Eq. (26)) in which $l_0$ can lie (while keeping all other marginals,
$m_i$, constant). It is possible that $l_0$ will be uniquely determined by these ranges (e.g. if cell
$(0, 1)$ requires $l_0 \in [0, m_1]$ and cell $(0, 2)$ requires $l_0 \in [m_2, 1]$ and $m_1 = m_2$). One
knows that there is at least one value of $l_0$ consistent with all ranges, but generically, $l_0$ will lie
in a range of the form $l_0 \in [u_1^{(1)}, u_2^{(1)}]$. Here $u_1^{(1)}$ is the largest lower bound on
$l_0$ and $u_2^{(1)}$ the smallest upper bound, with
$u_1^{(1)} \in \{0, 1 - m_0, 1 - m_1, \ldots, m_0, m_1, \ldots\}$ and
$u_2^{(1)} \in \{1, 1 - m_0, 1 - m_1, \ldots, m_0, m_1, \ldots\}$. An arbitrary distribution, $P^{(1)}$, with
marginal $P^{(1)}_{a=0|x=0} = l_0$ will have $l_0 \in [u_1^{(1)}, u_2^{(1)}]$ (with $u_1^{(1)}$, $u_2^{(1)}$
as defined previously). One can check that it can always be expressed as a convex combination of two
distributions $P_1^{(2)}$ and $P_2^{(2)}$ with $P^{(2)}_{1,\,a=0|x=0} = l_0 = u_1^{(1)}$ and
$P^{(2)}_{2,\,a=0|x=0} = l_0 = u_2^{(1)}$ respectively.
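Example:

$$
\left[ \begin{array}{cc}
m_0 & 0 \\ l_0 - m_0 & 1 - l_0 \\ \hline
0 & m_1 \\ l_0 & 1 - l_0 - m_1 \\ \hline
l_0 & m_2 - l_0 \\ 0 & 1 - m_2
\end{array} \right]
= \lambda
\left[ \begin{array}{cc}
m_0 & 0 \\ 0 & 1 - m_0 \\ \hline
0 & m_1 \\ m_0 & 1 - m_0 - m_1 \\ \hline
m_0 & m_2 - m_0 \\ 0 & 1 - m_2
\end{array} \right]
+ (1 - \lambda)
\left[ \begin{array}{cc}
m_0 & 0 \\ m_2 - m_0 & 1 - m_2 \\ \hline
0 & m_1 \\ m_2 & 1 - m_2 - m_1 \\ \hline
m_2 & 0 \\ 0 & 1 - m_2
\end{array} \right] \qquad (27)
$$

Each bracket is the column $x = 0$ with the cells $y = 0, 1, 2$ stacked from top to bottom.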

Above is a simple example of the procedure described. Without knowing the specific values of $m_0$, $m_1$
and $m_2$, and without even looking where the zeros are in each cell of the column, we do have the
basic knowledge that $l_0 \in [u_1^{(1)}, u_2^{(1)}]$ where $u_1^{(1)} \in \{0, 1 - m_0, 1 - m_1, 1 - m_2, m_0, m_1, m_2\}$ and
$u_2^{(1)} \in \{1, 1 - m_0, 1 - m_1, 1 - m_2, m_0, m_1, m_2\}$. By looking at this specific case we can now refine our bounds
on $l_0$. In the following we suppose, as an example, that $1 - m_1 > m_2$. From the cell (0, 0) on the
left hand side of Eq. (27) we know, by positivity, that $l_0 \in [m_0, 1]$. From the cell (0, 1) we know that
$l_0 \in [0, 1 - m_1]$. From the cell (0, 2) we know that $l_0 \in [0, m_2]$. Taking the largest lower bound and the
smallest upper bound from these ranges, and recalling that $1 - m_1 > m_2$, one finds that $l_0 \in [m_0, m_2]$.
The left hand side of Eq. (27) can be expressed as a convex combination of two columns where $l_0 = m_0$
and $l_0 = m_2$ and the $m_i$ are kept constant. Note that each of the two columns on the right hand side
of Eq. (27) contains a cell which has two zeros. These two columns each have one more zero than the
column on the left hand side.
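
The column decomposition of Eq. (27) can be checked numerically. The following Python fragment is a minimal sketch under stated assumptions: the numerical values of the marginals are illustrative and do not come from the paper, the function names are hypothetical, and each cell is read with rows labelled by $b$ and columns by $a$. It computes the range of $l_0$ from positivity, the weight $\lambda$, and confirms that the two endpoint columns mix back to the original column while each gains one zero.

```python
from fractions import Fraction as F

def column(l0, m0, m1, m2):
    """Cells (0,0), (0,1), (0,2) of the x = 0 column with the zeros of Eq. (27).

    Assumed layout of each cell: rows are b = 0, 1 and columns are a = 0, 1.
    """
    return [
        ((m0, F(0)), (l0 - m0, 1 - l0)),
        ((F(0), m1), (l0, 1 - l0 - m1)),
        ((l0, m2 - l0), (F(0), 1 - m2)),
    ]

def l0_range(m0, m1, m2):
    """Largest lower bound and smallest upper bound on l0 from positivity."""
    lower = max(F(0), m0)          # cell (0,0) needs l0 - m0 >= 0
    upper = min(F(1), 1 - m1, m2)  # cells (0,1) and (0,2)
    return lower, upper

def mix(lam, col_a, col_b):
    """Entrywise convex combination lam * col_a + (1 - lam) * col_b."""
    return [
        tuple(tuple(lam * p + (1 - lam) * q for p, q in zip(row_a, row_b))
              for row_a, row_b in zip(cell_a, cell_b))
        for cell_a, cell_b in zip(col_a, col_b)
    ]

def count_zeros(col):
    """Number of vanishing entries in a column."""
    return sum(entry == 0 for cell in col for row in cell for entry in row)

# Illustrative values (not from the paper), chosen so that 1 - m1 > m2.
m0, m1, m2 = F(1, 5), F(1, 4), F(1, 2)
l0 = F(3, 10)

u1, u2 = l0_range(m0, m1, m2)
lam = (u2 - l0) / (u2 - u1)  # weight of the column with l0 = u1

lhs = column(l0, m0, m1, m2)
low = column(u1, m0, m1, m2)   # column with l0 = m0
high = column(u2, m0, m1, m2)  # column with l0 = m2
```

With these values the range is $[m_0, m_2] = [1/5, 1/2]$ and $\lambda = (m_2 - l_0)/(m_2 - m_0) = 2/3$; `mix(lam, low, high)` reproduces `lhs` exactly because all arithmetic is done with rational numbers.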



3. Generalizing 

In this subsection we provide a procedure which shows that any distribution with one zero in every 
cell can be expressed as convex combinations of probability distributions which are locally equivalent to 
Table II. We first provide the loop of the procedure and second the condition for its termination. This 
approach is effectively a generalization of the decomposition of the column given in the preceding section. 
From Part 1 it is sufficient to consider only distributions which have one zero in every cell. 

Loop: The loop considered is of the form described in Eqs. (16-21): (I) a distribution is expressed
as a convex combination of two simpler distributions; (II) one of these distributions is selected and then
becomes the distribution in step (I). The loop then continues.

In what follows we will follow the loop through two cycles. 






(I) A starting distribution $P^{(1)}$ will have $l_0$ constrained to lie in a range $l_0 \in [u_1^{(1)}, u_2^{(1)}]$ where
$u_1^{(1)} \in \{0, 1 - m_0, 1 - m_1, \ldots, m_0, m_1, \ldots\}$ and $u_2^{(1)} \in \{1, 1 - m_0, 1 - m_1, \ldots, m_0, m_1, \ldots\}$. It can be expressed as a
convex combination of two distributions, $P_1^{(2)}$ and $P_2^{(2)}$. These satisfy the further constraints that
$P^{(2)}_{1,\,a=0|x=0} = l_0 = u_1^{(1)}$ and $P^{(2)}_{2,\,a=0|x=0} = l_0 = u_2^{(1)}$ respectively. (This implies that $P_1^{(2)}$ and $P_2^{(2)}$
each have one cell which has two zeros in their $x = 0$ columns.)

(II) The distribution $P_{k_1}^{(2)}$, $k_1 \in \{1, 2\}$, is chosen as $P^{(2)}$.

(I) $P^{(2)}$ will have $l_0 = u_1^{(1)}$ or $u_2^{(1)}$. There is now one less parameter in the table because two of the
marginals have been related. $l_0 = u_{k_1}^{(1)}$ will also be constrained to lie in a new range $l_0 \in [u_1^{(2)}, u_2^{(2)}]$
where $u_1^{(2)} \in \{0, 1 - m_0, 1 - m_1, \ldots, m_0, m_1, \ldots, 1 - l_0, 1 - l_1, \ldots, l_0, l_1, \ldots\}$ and $u_2^{(2)} \in \{1, 1 - m_0, 1 - m_1, \ldots,
m_0, m_1, \ldots, 1 - l_0, 1 - l_1, \ldots, l_0, l_1, \ldots\}$. $P^{(2)}$ can be written as a convex combination of a distribution
$P_1^{(3)}$, with $l_0 = u_{k_1}^{(1)} = u_1^{(2)}$, and $P_2^{(3)}$, with $l_0 = u_{k_1}^{(1)} = u_2^{(2)}$.

(II) The distribution $P_{k_2}^{(3)}$, $k_2 \in \{1, 2\}$, is chosen as $P^{(3)}$.
(I) ... 

Depending on the choices made at each step (II) the procedure creates distributions satisfying a chain
of equivalences between their marginals:

\begin{equation}
l_0 = u_{k_1}^{(1)} = u_{k_2}^{(2)} = u_{k_3}^{(3)} = \ldots = u_{k_F}^{(F)},
\tag{28}
\end{equation}

which will be specified by the string $(k_1, k_2, \ldots, k_F)$ with $k_i \in \{1, 2\}$. The nature of $u_{k_F}^{(F)}$ will be discussed
as part of the termination conditions. Noting which sets the $u_1^{(i)}$ and $u_2^{(i)}$ are chosen from, a chain of
equivalences could, for example, be of the form $l_0 = m_2 = 1 - m_0 = l_1 = \ldots$ . Note that after each cycle
the distributions have one less parameter as more and more of their marginals are related to each other.
The procedure shrinks the number of free parameters as it converges towards extreme points (these have
no free parameters).

The first equivalence in a chain of equalities can only be $l_0 = m_n$ or $l_0 = 1 - m_n$ for some $n$ (the $l_0 = 0$
or 1 case will be discussed as part of the termination conditions). This is explained by noting that in the
first cycle $l_0$ is constrained by cells in the same column (see the preceding subsection). It follows that
$u_1^{(1)}$ lies in the set $\{0, 1 - m_0, 1 - m_1, \ldots, m_0, m_1, \ldots\}$ and $u_2^{(1)}$ lies in $\{1, 1 - m_0, 1 - m_1, \ldots, m_0, m_1, \ldots\}$, which
only depend on the values of the $m_j$. After the first cycle in which $l_0 = m_n$ or $l_0 = 1 - m_n$, the cells
in both column ($x = 0$) and the row ($y = n$) will provide constraints on $l_0$. This is because the cells in
row ($y = n$) all depend on $m_n$. With some thought, one sees that in general $u_1^{(i)}$ will thus lie in the set
$\{0, 1 - m_0, 1 - m_1, \ldots, m_0, m_1, \ldots, 1 - l_0, 1 - l_1, \ldots, l_0, l_1, \ldots\}$ and $u_2^{(i)}$ in $\{1, 1 - m_0, 1 - m_1, \ldots, m_0, m_1, \ldots, 1 - l_0,
1 - l_1, \ldots, l_0, l_1, \ldots\}$, and these depend on both $m_j$ and $l_j$.

Termination conditions: We now discuss loop termination. The loop terminates, after $F$ cycles, in one of
two distinct ways.

(a) When $u_{k_F}^{(F)} = 0$ or $1$.

(b) When $u_{k_F}^{(F)} = 1 - u_{k_g}^{(g)}$ for $g < F$. This is only satisfied if $u_{k_F}^{(F)} = 1/2$.

An example of case (b) would be $l_0 = m_2 = 1 - m_0 = l_1 = \ldots = 1 - m_2$, which implies that all of these
numbers must be 1/2.
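
Termination condition (b) can be illustrated with a small bookkeeping structure. The Python fragment below is only a sketch and not the procedure of the paper: it assumes that each cycle contributes one relation of the form $v = w$ or $v = 1 - w$ between two marginals, and it uses a union-find structure with a parity flag (a swapped-in technique, with the hypothetical class name `MarginalChain`) to detect when a chain closes on $v = 1 - v$, which forces the value 1/2.

```python
from fractions import Fraction

class MarginalChain:
    """Union-find with parity: tracks relations v = w and v = 1 - w."""

    def __init__(self):
        self.parent = {}
        self.flip = {}   # flip[v] is True when v = 1 - parent[v]
        self.value = {}  # value of a root once the chain has fixed it

    def find(self, v):
        """Return (root, flipped): v = 1 - root if flipped, else v = root."""
        if v not in self.parent:
            self.parent[v], self.flip[v] = v, False
        flipped = False
        while self.parent[v] != v:
            flipped ^= self.flip[v]
            v = self.parent[v]
        return v, flipped

    def relate(self, v, w, complement=False):
        """Record v = w, or v = 1 - w when complement is True."""
        rv, fv = self.find(v)
        rw, fw = self.find(w)
        parity = fv ^ fw ^ complement
        if rv == rw:
            if parity:  # the chain closes on root = 1 - root
                self.value[rv] = Fraction(1, 2)
            return
        self.parent[rv], self.flip[rv] = rw, parity
        if rv in self.value:  # carry an already fixed value to the new root
            val = self.value.pop(rv)
            self.value.setdefault(rw, 1 - val if parity else val)

    def get(self, v):
        """Value forced on v by the recorded relations, or None if still free."""
        root, flipped = self.find(v)
        if root not in self.value:
            return None
        return 1 - self.value[root] if flipped else self.value[root]

# The chain l0 = m2 = 1 - m0 = l1 = 1 - m2 from the example of case (b).
chain = MarginalChain()
chain.relate("l0", "m2")
chain.relate("m2", "m0", complement=True)
chain.relate("m0", "l1", complement=True)   # 1 - m0 = l1
chain.relate("l1", "m2", complement=True)   # closes the chain: m2 = 1 - m2
```

After the last relation, `chain.get` returns 1/2 for each of `"l0"`, `"l1"`, `"m0"` and `"m2"`, while a marginal that has not yet been related to the chain is reported as free (`None`).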

After the loop terminates, several marginals from the set of all $l_i$ and $m_i$ will have been set to either
0, 1, or 1/2 (the procedure as a whole always terminates, as there are a finite number of marginals to
be equated). If there exists a set of marginals which have not been fixed to one of these values, a new
marginal $l_i$ (or $m_i$) from this set can be chosen. The form of the above loop can then be repeated by
studying constraints on this new variable.

By repeating this procedure, all marginals, $m_j$ and $l_j$, will be absorbed into a chain of equalities
terminating in 0, 1 or 1/2. A probability distribution equivalent to Table II will be the only possible
result. It will generally be necessary to perform some local relabelling to obtain distributions of the form
of Table II. For example, the outcomes for all deterministic input settings have to be fixed to '0'.
Part 3

The following proves by contradiction that all distributions of the form defined in Table II are extremal.
Suppose that a particular distribution $P^{(1)}$ of the form defined in Table II is not extremal. It can thus
be expressed as a convex combination of more than one distribution. Positivity requires that these
distributions have a zero where $P^{(1)}$ has a zero.

Suppose, from Table II, that $P^{(1)}$ has $g_x = g_y = 1$; then all of its cells have three zeros. This
distribution cannot be expressed as a convex combination of two distinct distributions with the same
zeros, since normalization fixes the fourth entry of each cell to be one. $P^{(1)}$ is the only distribution with
these zeros.

If $P^{(1)}$ has $g_x, g_y \geq 2$ it will have some cells with three zeros (if $g_x < d_x$ and $g_y < d_y$) and some with
two zeros. As noted above, the three zero cells have their fourth entry fixed by normalization. A study
of the distribution of zeros in the four cells $(i, j)$, $i, j \in \{0, 1\}$, shows that all remaining non-zero entries
are forced to be one-half. $P^{(1)}$ is the only distribution with its particular distribution of zeros.
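
The claim that the zeros force the remaining entries to one-half can be checked for the smallest case. The Python fragment below is a sketch under the assumption that the four cells $(i, j)$, $i, j \in \{0, 1\}$, carry the standard PR-box zero pattern, $P(a, b|x, y) = 0$ whenever $a \oplus b \neq xy$; the exact layout of Table II may differ from this by a local relabelling. The helper names are hypothetical. It imposes the zeros, normalization and the nonsignaling conditions as a linear system and solves it exactly.

```python
from fractions import Fraction
from itertools import product

def idx(a, b, x, y):
    """Position of P(a, b | x, y) in a flat vector of 16 unknowns."""
    return ((x * 2 + y) * 2 + a) * 2 + b

def build_constraints():
    """Zeros, normalization and nonsignaling for the assumed PR-box pattern."""
    rows, rhs = [], []

    def add(terms, value):
        row = [Fraction(0)] * 16
        for i, coeff in terms:
            row[i] += coeff
        rows.append(row)
        rhs.append(Fraction(value))

    outcomes = list(product(range(2), repeat=2))
    for x, y in product(range(2), repeat=2):
        add([(idx(a, b, x, y), 1) for a, b in outcomes], 1)  # normalization
        for a, b in outcomes:
            if (a ^ b) != (x & y):                            # imposed zero
                add([(idx(a, b, x, y), 1)], 0)
    for x in range(2):  # P(a=0|x) must not depend on y
        add([(idx(0, b, x, 0), 1) for b in range(2)]
            + [(idx(0, b, x, 1), -1) for b in range(2)], 0)
    for y in range(2):  # P(b=0|y) must not depend on x
        add([(idx(a, 0, 0, y), 1) for a in range(2)]
            + [(idx(a, 0, 1, y), -1) for a in range(2)], 0)
    return rows, rhs

def solve_unique(rows, rhs):
    """Exact Gauss-Jordan elimination; the solution if unique, else None."""
    m = [row[:] + [value] for row, value in zip(rows, rhs)]
    n = len(rows[0])
    pivot_row = 0
    for col in range(n):
        p = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if p is None:
            return None  # free variable: the zeros do not fix the distribution
        m[pivot_row], m[p] = m[p], m[pivot_row]
        pivot = m[pivot_row][col]
        m[pivot_row] = [v / pivot for v in m[pivot_row]]
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vp for vr, vp in zip(m[r], m[pivot_row])]
        pivot_row += 1
    if any(row[-1] != 0 for row in m[pivot_row:]):
        return None  # inconsistent constraints
    return [m[i][-1] for i in range(n)]

solution = solve_unique(*build_constraints())
```

Under these assumptions the system has a single solution, in which every entry not set to zero equals 1/2, so no second distribution with the same zeros is available for a convex decomposition.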